Learning-based Spam Filters: the Influence of the Temporal Distribution of Training Data

Author

  • Anton Bryl
Abstract

The great number and variety of learning-based spam filters proposed in recent years call for thorough, many-sided evaluation that takes the characteristics of spam into account. This paper analyzes how filter performance depends on the temporal distribution of the training data; this dependence arises because email changes over time. Such analysis provides additional information about filter quality and may also help organize more effective training of the filter. The naïve Bayes filter is chosen for evaluation in this paper.
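As a rough illustration of the kind of experiment the abstract describes, the sketch below trains a small naïve Bayes classifier on two different time windows of a toy corpus and compares accuracy on later mail. Everything in it is an assumption made for illustration: the messages, the integer "day" timestamps, the window boundaries, and the names `train_nb`, `classify`, `accuracy`, and `window` are hypothetical and are not taken from the paper, whose actual corpus, features, and evaluation protocol are not given here.

```python
import math
from collections import Counter

def train_nb(messages):
    """Collect per-class word counts from (day, text, label) tuples."""
    word_counts = {"ham": Counter(), "spam": Counter()}
    class_counts = Counter()
    for _, text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["ham"]) | set(word_counts["spam"])
    return word_counts, class_counts, vocab

def classify(model, text):
    """Return the label with the highest log-posterior; ties go to ham."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in ("ham", "spam"):  # ham first, so an exact tie stays ham
        # Laplace-smoothed prior and word likelihoods avoid log(0).
        score = math.log((class_counts[label] + 1) / (total + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, messages):
    hits = sum(classify(model, text) == label for _, text, label in messages)
    return hits / len(messages)

def window(messages, start, end):
    """Select messages whose day falls in the half-open range [start, end)."""
    return [m for m in messages if start <= m[0] < end]

# Hypothetical toy corpus: spam vocabulary drifts after day 10, ham is stable.
DATA = [
    (1, "cheap pills offer", "spam"), (2, "cheap pills discount", "spam"),
    (3, "project meeting agenda", "ham"), (4, "lunch meeting tomorrow", "ham"),
    (11, "crypto bonus offer", "spam"), (12, "crypto bonus win", "spam"),
    (13, "project report draft", "ham"), (14, "meeting report notes", "ham"),
    (21, "crypto bonus today", "spam"), (22, "win crypto now", "spam"),
    (23, "project meeting notes", "ham"), (24, "report draft tomorrow", "ham"),
]

old_model = train_nb(window(DATA, 0, 10))      # trained on older mail only
recent_model = train_nb(window(DATA, 10, 20))  # trained on more recent mail
test_set = window(DATA, 20, 30)                # mail that arrives afterwards

print("old window accuracy:   ", accuracy(old_model, test_set))
print("recent window accuracy:", accuracy(recent_model, test_set))
```

On this toy data the model trained on the more recent window scores higher, because the older window never saw the later spam vocabulary. This says nothing about the paper's actual results; it only shows how such a comparison can be set up.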

Similar resources

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

Email is one of the fastest and most economical means of communication. The growing number of email users has led to an increase in spam in recent years. Spam not only costs users money, time, and bandwidth, but has also become a risk to the efficiency, reliability, and security of networks. Spam developers are always trying to find ways to escape the existing filters; therefore new filters to de...

An Anti-spam Filter Combination Framework for Text-and-Image Emails through Incremental Learning

We present an anti-spam filtering framework that combines text-based and image-based anti-spam filters. First, an incremental learning approach to reducing mismatches between training and test datasets is proposed to resolve the problem of a lack of training data for legitimate emails that contain both text and images. Then, the outputs of text-based and image-based filters are combined with th...

Denial of Information Attacks in Event Processing

Automated Denial of Information Attacks. It is a common assumption in event processing that the events are “clean”, i.e., they come from well-behaved and trustworthy sources. This assumption does not hold in all major open communications media for several reasons. First, adversaries may spread massive noise data, e.g., in email spam. Second, adversaries may inject potentially interesting, but o...

Classifying Unsolicited Bulk Email (UBE) using Python Machine Learning Techniques

Email has become one of the fastest and most economical forms of communication. However, the increase of email users has resulted in the dramatic increase of spam emails during the past few years. As spammers always try to find a way to evade existing filters, new filters need to be developed to catch spam. Generally, the main tool for email filtering is based on text classification. A classifi...

A Suitable Method for Classifying Advertising Emails Based on User Profiles

In general, whether a message is spam depends on whether it satisfies the recipient, not on the content of the email itself. Under this definition, problems arise in marketing and advertising: for example, some advertising emails may be spam for some users and not for others. To deal with this problem, many researchers design an anti-s...

Publication date: 2006